{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Update the draw probabilities for the final draw in section `4.5.3.`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As it is mencioned in the 4.5.3 numerical example, the first number drawn and included in $\\phi$ is 2 from the set {1, 2, 3} correponding to the columns heading  $\\to \\phi^{1}=[2]$. This number selection was carried out with a uniform probability of $\\delta_{i}=\\frac{1}{3}$.\n",
    "\n",
    "To calculate the probabilities of the next draw it is necessary to calculate the average uniqueness of the features (or possible numbers) taking into account the previous draw.\n",
    "\n",
    "It is important to mention that the set of the numbers to be drawn {1, 2, 3} corresponds to the columns heading [0, 1, 2] of the matrix indM.\n",
    "\n",
    "Before presenting the calculation let's summarize the mathematical expressions composing the average uniqueness:\n",
    "\n",
    "$\\bar{u}_i = (\\sum_{t=1}^{T}u_{t,1})(\\sum_{t=1}^{T}1_{t,i})^{-1}$\n",
    "\n",
    "$u_{t,i}=1_{t,i} c^{-1}$\n",
    "\n",
    "$c = \\sum_{i=1}^{I} 1_{t,i}$\n",
    "\n",
    "then,   \n",
    "$u_{t,i}=1_{t,i}\\sum_{i=1}^{I} 1_{t,i}$\n",
    "\n",
    "\n",
    "Knowing that $indM = \\begin{bmatrix}\n",
    "                     \\underline{\\mathbf{\\color{red}1}} & \\underline{\\mathbf{\\color{red}2}} & \\underline{\\mathbf{\\color{red}3}}\\\\\n",
    "                      1 & 0 & 0 \\\\\n",
    "                      1 & 0 & 0 \\\\\n",
    "                      1 & 1 & 0 \\\\\n",
    "                      0 & 1 & 0 \\\\\n",
    "                      0 & 0 & 1 \\\\\n",
    "                      0 & 0 & 1 \n",
    "                      \\end{bmatrix}$\n",
    "\n",
    "\n",
    "The numbers of the example developed in the book comes from:\n",
    "\n",
    ">$\\bar{u}_{i}^{(2)} = (1 + 1 + 1) \\frac{1}{3}$\n",
    "\n",
    "where \n",
    "\n",
 $\\bar{u}_{i}^{(2)}">
    "> $\\bar{u}_{i}^{(2)} = (\\sum_{t=1}^{T}u_{t,i}^{(2)})(\\sum_{t=1}^{T}1_{t,i})^{-1}$\n",
    "\n",
    "> $u_{t,i}^{(2)}=1_{t,i}(1+\\sum_{k\\in\\phi^{(1)}}1_{t,k})^{-1}$\n",
    "\n",
    "\n",
    "$u_{t,1}^{(2)}=\\begin{bmatrix}1\\\\1\\\\1\\\\0\\\\0\\\\0\\end{bmatrix} (1+\\sum_{k\\in\\phi^{(1)}} \\begin{bmatrix}0\\\\0\\\\1\\\\1\\\\0\\\\0\\end{bmatrix})^{-1} = \\begin{bmatrix}1\\\\1\\\\1\\\\0\\\\0\\\\0\\end{bmatrix} (1+\\begin{bmatrix}0\\\\0\\\\1\\\\1\\\\0\\\\0\\end{bmatrix})^{-1} = \\begin{bmatrix}1\\\\1\\\\1\\\\0\\\\0\\\\0\\end{bmatrix}\\div\\begin{bmatrix}1\\\\1\\\\2\\\\2\\\\1\\\\1\\end{bmatrix} = \\begin{bmatrix}1\\\\1\\\\\\frac{1}{2}\\\\0\\\\0\\\\0\\end{bmatrix}$\n",
    "\n",
    "With this value we can calculate the average uniqueness of the first feature:\n",
    "\n",
$\\bar{u}_i = (\\sum">
    ">$\\bar{u}_i = (\\sum_{t=1}^{T}u_{t,i})(\\sum_{t=1}^{T}1_{t,i})^{-1}$\n",
    "\n",
    ">$\\bar{u}_{1}^{(2)}=(1 + 1 + \\frac{1}{2})\\div(1 + 1 + 1)=\\frac{\\frac{5}{2}}{3}=\\frac{5}{6}$\n",
    "                \n",
    "    \n",
    "    \n",
    "**feature 2:**\n",
    "\n",
    "$u_{t,2}^{(2)}=\\begin{bmatrix}0\\\\0\\\\1\\\\1\\\\0\\\\0\\end{bmatrix} (1+\\sum_{k\\in\\phi^{(1)}} \\begin{bmatrix}0\\\\0\\\\1\\\\1\\\\0\\\\0\\end{bmatrix})^{-1} = \\begin{bmatrix}0\\\\0\\\\1\\\\1\\\\0\\\\0\\end{bmatrix} (1+\\begin{bmatrix}0\\\\0\\\\1\\\\1\\\\0\\\\0\\end{bmatrix})^{-1} = \\begin{bmatrix}0\\\\0\\\\1\\\\1\\\\0\\\\0\\end{bmatrix}\\div\\begin{bmatrix}1\\\\1\\\\2\\\\2\\\\1\\\\1\\end{bmatrix} = \\begin{bmatrix}0\\\\0\\\\\\frac{1}{2}\\\\\\frac{1}{2}\\\\0\\\\0\\end{bmatrix}$\n",
    "\n",
    "                \n",
    ">$\\bar{u}_{2}^{(2)}=(\\frac{1}{2} + \\frac{1}{2})\\div(1 + 1)=\\frac{1}{2}$\n",
    "                                \n",
    "\n",
    "**feature 3:**\n",
    "\n",
    "$u_{t,3}^{(2)}=\\begin{bmatrix}0\\\\0\\\\0\\\\0\\\\1\\\\1\\end{bmatrix} (1+\\sum_{k\\in\\phi^{(1)}} \\begin{bmatrix}0\\\\0\\\\1\\\\1\\\\0\\\\0\\end{bmatrix})^{-1} = \\begin{bmatrix}0\\\\0\\\\0\\\\0\\\\1\\\\1\\end{bmatrix} (1+\\begin{bmatrix}0\\\\0\\\\1\\\\1\\\\0\\\\0\\end{bmatrix})^{-1} = \\begin{bmatrix}0\\\\0\\\\0\\\\0\\\\1\\\\1\\end{bmatrix}\\div\\begin{bmatrix}1\\\\1\\\\2\\\\2\\\\1\\\\1\\end{bmatrix} = \\begin{bmatrix}0\\\\0\\\\0\\\\0\\\\1\\\\1\\end{bmatrix}$\n",
    "\n",
    "                \n",
    ">$\\bar{u}_{3}^{(2)}=(1 + 1)\\div(1 + 1)=1$\n",
    "                                \n",
    "\n",
    "Having calculated all the average uniqueness values the next step is to calculate the probability:\n",
    "\n",
    ">$\\delta_{j}^{(2)}=\\bar{u}_{j}^{(2)}(\\sum_{k=1}^{I}\\bar{u}_{k}^{(2)})^{-1}$\n",
    "\n",
    ">$\\sum_{k=1}^{I}\\bar{u}_{k}^{(2)}=\\frac{5}{6}+\\frac{1}{2}+1=\\frac{5}{6}+\\frac{3}{6}+\\frac{6}{6}=\\frac{14}{6}=\\frac{7}{3}$\n",
    "\n",
    ">$\\delta_{1}^{(2)}=\\frac{\\frac{5}{6}}{\\frac{7}{3}}=\\frac{5}{14}=0.3571$\n",
    "\n",
    ">$\\delta_{2}^{(2)}=\\frac{\\frac{1}{2}}{\\frac{7}{3}} = \\frac{3}{14}=0.2143$\n",
    "\n",
    ">$\\delta_{3}^{(2)}=\\frac{1}{\\frac{7}{3}}=\\frac{3}{7}=\\frac{6}{14}=0.4286$\n",
    "\n",
    ">$\\sum\\delta^{(2)}=\\frac{5}{14}+\\frac{3}{14}+\\frac{6}{14}=\\frac{14}{14}=1$\n",
    "\n",
    "Notes from the book:\n",
    "\n",
    "1) The feature already picked gets the lowest probability due to the overlap with itself would be highest. \n",
    "\n",
    "2) The third feature gets the highest probability because it has no overlap with the second feature which was the picked one. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The previous results can be obtained with the following code: "
   ]
  },
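  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The code cells in this notebook call `getIndMatrix` and `getAvgUniqueness` and use pandas as `pd`, none of which is defined here. The cell below is a minimal sketch of both helpers, assuming they behave like the book's Snippets 4.3 and 4.4: an indicator matrix built from the label end times, and the mean of the non-zero uniqueness values of each column. The function bodies are a reconstruction under that assumption, not the book's exact code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "def getIndMatrix(barIx, t1):\n",
    "    # One row per bar and one column per label; 1 where the label spans the bar.\n",
    "    # t1.index holds each label's first bar, t1.values its last bar (inclusive).\n",
    "    indM = pd.DataFrame(0.0, index=barIx, columns=range(t1.shape[0]))\n",
    "    for i, (t0, t1_) in enumerate(t1.items()):\n",
    "        indM.loc[t0:t1_, i] = 1.0\n",
    "    return indM\n",
    "\n",
    "def getAvgUniqueness(indM):\n",
    "    # Average uniqueness of each column of the indicator matrix.\n",
    "    c = indM.sum(axis=1)     # concurrency: number of labels active at each bar\n",
    "    u = indM.div(c, axis=0)  # uniqueness of each label at each bar\n",
    "    return u[u > 0].mean()   # average over the bars where the label is active"
   ]
  },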
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t1=pd.Series([2,3,5],index=[0,2,4])\n",
    "barIx=range(t1.max()+1) # index of bars\n",
    "indM=getIndMatrix(barIx,t1)\n",
    "indM     \n",
    "# The columns heading correspond to the feature set {1, 2, 3} respectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "phi = [1]        # phi = [1] corresponds to column 1, feature 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Following statements are taken from the SNIPPET 4.5 - function seqBootstrap(indM,sLength=None)\n",
    "\n",
    "avgU=pd.Series()\n",
    "for i in indM:\n",
    "    indM_=indM[phi+[i]] # reduce indM\n",
    "    avgU.loc[i]=getAvgUniqueness(indM_).iloc[-1]\n",
    "\n",
    "print('Average Uniqueness: \\n',avgU)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prob1 = avgU/avgU.sum()\n",
    "print('Feature draw probabilities: \\n', prob1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The numerical example continues selecting 3 as the second draw.\n",
    "\n",
    "$\\phi = [2, 3]$\n",
    "\n",
    "**first feature:**\n",
    "\n",
    "$u_{t,1}=1_{t,1}(1+\\sum_{k\\in\\phi}1_{t,k})^{-1}$\n",
    "\n",
    "$u_{t,1}=\\begin{bmatrix}1\\\\1\\\\1\\\\0\\\\0\\\\0\\end{bmatrix}\\times(1+\\sum\\begin{bmatrix}0 & 0\\\\0 & 0\\\\1 & 0\\\\1 & 0\\\\0 & 1\\\\0 & 1\\end{bmatrix})^{-1}$\n",
    "\n",
    "$u_{t,1}=\\begin{bmatrix}1\\\\1\\\\1\\\\0\\\\0\\\\0\\end{bmatrix}\\times(1+\\begin{bmatrix}0\\\\0\\\\1\\\\1\\\\1\\\\1\\end{bmatrix})^{-1}=\\begin{bmatrix}1\\\\1\\\\1\\\\0\\\\0\\\\0\\end{bmatrix}\\times\\begin{bmatrix}1\\\\1\\\\2\\\\2\\\\2\\\\2\\end{bmatrix}^{-1}=\\begin{bmatrix}1\\\\1\\\\\\frac{1}{2}\\\\0\\\\0\\\\0\\end{bmatrix}$\n",
    "\n",
    "$\\bar{u}_{1}^{(2,3)}=(\\sum_{t=1}^{T}u_{t,1})(\\sum_{t,1}^{T}1_{t,1})^{-1} = \\frac{(1+1+\\frac{1}{2})}{(1+1+1)}=\\frac{\\frac{2}{2}+\\frac{2}{2}+\\frac{1}{2}}{3}$\n",
    "\n",
    "$\\bar{u}_{1}^{(2,3)} = \\frac{5}{6}$\n",
    "\n",
    "**second feature:**\n",
    "\n",
    "$u_{t,2}=1_{t,2}(1+\\sum_{k\\in\\phi}1_{t,k})^{-1}$\n",
    "\n",
    "$u_{t,2}=\\begin{bmatrix}0\\\\0\\\\1\\\\1\\\\0\\\\0\\end{bmatrix}\\times(1+\\sum\\begin{bmatrix}0 & 0\\\\0 & 0\\\\1 & 0\\\\1 & 0\\\\0 & 1\\\\0 & 1\\end{bmatrix})^{-1}$\n",
    "\n",
    "$u_{t,2}=\\begin{bmatrix}0\\\\0\\\\1\\\\1\\\\0\\\\0\\end{bmatrix}\\times(1+\\begin{bmatrix}0\\\\0\\\\1\\\\1\\\\1\\\\1\\end{bmatrix})^{-1}=\\begin{bmatrix}0\\\\0\\\\1\\\\1\\\\0\\\\0\\end{bmatrix}\\times\\begin{bmatrix}1\\\\1\\\\2\\\\2\\\\2\\\\2\\end{bmatrix}^{-1}=\\begin{bmatrix}0\\\\0\\\\\\frac{1}{2}\\\\\\frac{1}{2}\\\\0\\\\0\\end{bmatrix}$\n",
    "\n",
    "$\\bar{u}_{2}^{(2,3)}=(\\sum_{t=1}^{T}u_{t,2})(\\sum_{t,1}^{T}1_{t,2})^{-1} = \\frac{(\\frac{1}{2}+\\frac{1}{2})}{(1+1)}=\\frac{1}{2}$\n",
    "\n",
    "**third feature:**\n",
    "\n",
    "$u_{t,3}=1_{t,3}(1+\\sum_{k\\in\\phi}1_{t,k})^{-1}$\n",
    "\n",
    "$u_{t,3}=\\begin{bmatrix}0\\\\0\\\\0\\\\0\\\\1\\\\1\\end{bmatrix}\\times(1+\\sum\\begin{bmatrix}0 & 0\\\\0 & 0\\\\1 & 0\\\\1 & 0\\\\0 & 1\\\\0 & 1\\end{bmatrix})^{-1}$\n",
    "\n",
    "$u_{t,3}=\\begin{bmatrix}0\\\\0\\\\0\\\\0\\\\1\\\\1\\end{bmatrix}\\times(1+\\begin{bmatrix}0\\\\0\\\\1\\\\1\\\\1\\\\1\\end{bmatrix})^{-1}=\\begin{bmatrix}0\\\\0\\\\0\\\\0\\\\1\\\\1\\end{bmatrix}\\times\\begin{bmatrix}1\\\\1\\\\2\\\\2\\\\2\\\\2\\end{bmatrix}^{-1}=\\begin{bmatrix}0\\\\0\\\\0\\\\0\\\\\\frac{1}{2}\\\\\\frac{1}{2}\\end{bmatrix}$\n",
    "\n",
    "$\\bar{u}_{3}^{(2,3)}=(\\sum_{t=1}^{T}u_{t,3})(\\sum_{t,1}^{T}1_{t,3})^{-1} = \\frac{(\\frac{1}{2}+\\frac{1}{2})}{(1+1)}=\\frac{1}{2}$\n",
    "\n",
    "Once the average uniqueness of all features has been calculated the next step is to get the probabilities.\n",
    "\n",
$\\delta_{j}^{(2,3)}=\\bar{u}">
    ">$\\delta_{j}^{(2,3)}=\\bar{u}_{j}^{(2,3)}(\\sum_{k=1}^{I}\\bar{u}_{k}^{(2,3)})^{-1}$\n",
    "\n",
    ">$\\delta_{j}^{(2,3)}=\\frac{(\\frac{5}{6}, \\frac{1}{2}, \\frac{1}{2})}{(\\frac{5}{6}+\\frac{1}{2}+\\frac{1}{2})}=\\frac{(\\frac{5}{6}, \\frac{1}{2}, \\frac{1}{2})}{\\frac{11}{6}}$\n",
    "\n",
    ">$\\delta_{j}^{(2,3)}=(\\frac{5}{11}, \\frac{3}{11}, \\frac{3}{11})$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The previous results can be obtained with the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "phi = [1,2]      # These values corresponds to the indM columns or features [2,3]."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Following statements are taken from the SNIPPET 4.5 - function seqBootstrap(indM,sLength=None)\n",
    "avgU=pd.Series()\n",
    "for i in indM:\n",
    "    indM_=indM[phi+[i]] # reduce indM\n",
    "    avgU.loc[i]=getAvgUniqueness(indM_).iloc[-1]\n",
    "\n",
    "print('Average Uniqueness: \\n',avgU)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prob2 = avgU/avgU.sum()\n",
    "print('Feature draw probabilities: \\n', prob2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The highest probability corresponds to the feature with the least overlap due to it has not been selected."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## In Section `4.5.3` suppose that number 2 is picked again in the second draw. Waht would be the updated probabilities for the third draw?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Number 2 is located in column 1."
   ]
  },
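  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sketch of the hand calculation, the same steps used for the earlier draws can be applied with $\\phi = [2, 2]$. Feature 2 now appears twice in $\\phi$, so the bars it spans (the third and fourth rows of indM) each contribute 2 to the sum over $\\phi$:\n",
    "\n",
    "$1+\\sum_{k\\in\\phi}1_{t,k} = 1+\\begin{bmatrix}0\\\\0\\\\2\\\\2\\\\0\\\\0\\end{bmatrix} = \\begin{bmatrix}1\\\\1\\\\3\\\\3\\\\1\\\\1\\end{bmatrix}$\n",
    "\n",
    ">$\\bar{u}_{1}^{(2,2)}=(1 + 1 + \\frac{1}{3})\\div(1 + 1 + 1)=\\frac{7}{9}$\n",
    "\n",
    ">$\\bar{u}_{2}^{(2,2)}=(\\frac{1}{3} + \\frac{1}{3})\\div(1 + 1)=\\frac{1}{3}$\n",
    "\n",
    ">$\\bar{u}_{3}^{(2,2)}=(1 + 1)\\div(1 + 1)=1$\n",
    "\n",
    ">$\\sum_{k=1}^{I}\\bar{u}_{k}^{(2,2)}=\\frac{7}{9}+\\frac{3}{9}+\\frac{9}{9}=\\frac{19}{9}$\n",
    "\n",
    ">$\\delta^{(2,2)}=(\\frac{7}{19}, \\frac{3}{19}, \\frac{9}{19})\\approx(0.3684, 0.1579, 0.4737)$\n",
    "\n",
    "Feature 2, drawn twice, becomes less likely than it was after the first draw ($\\frac{3}{14}$), while feature 3, which shares no bars with it, becomes more likely."
   ]
  },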
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "phi = [1,1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "avgU=pd.Series()\n",
    "for i in indM:\n",
    "    indM_=indM[phi+[i]] # reduce indM\n",
    "    avgU.loc[i]=getAvgUniqueness(indM_).iloc[-1]\n",
    "\n",
    "print('Average Uniqueness: \\n',avgU)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prob3 = avgU/avgU.sum()\n",
    "print('Feature draw probabilities: \\n', prob3)"
   ]
  },
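  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a cross-check, the cell below is a self-contained sketch that applies $u_{t,i}=1_{t,i}(1+\\sum_{k\\in\\phi}1_{t,k})^{-1}$ directly with exact fractions instead of going through `getAvgUniqueness`. The names `IND_M` and `draw_probabilities` are hypothetical and exist only for this illustration; the matrix is the indM of Section 4.5.3 typed in by hand, and the three cases are the draws discussed in this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from fractions import Fraction\n",
    "\n",
    "# Indicator matrix from Section 4.5.3: rows are bars, columns are features 1, 2, 3.\n",
    "IND_M = [\n",
    "    [1, 0, 0],\n",
    "    [1, 0, 0],\n",
    "    [1, 1, 0],\n",
    "    [0, 1, 0],\n",
    "    [0, 0, 1],\n",
    "    [0, 0, 1],\n",
    "]\n",
    "\n",
    "def draw_probabilities(ind_m, drawn):\n",
    "    # Hypothetical helper: exact draw probabilities given the 0-based columns already drawn.\n",
    "    avg_u = []\n",
    "    for i in range(len(ind_m[0])):\n",
    "        # u_{t,i} = 1 / (1 + sum_{k in drawn} 1_{t,k}) on the bars where 1_{t,i} = 1\n",
    "        u = [Fraction(1, 1 + sum(row[k] for k in drawn)) for row in ind_m if row[i] == 1]\n",
    "        avg_u.append(sum(u) / len(u))  # average uniqueness of feature i\n",
    "    total = sum(avg_u)\n",
    "    return [a / total for a in avg_u]\n",
    "\n",
    "# Drawn columns [1], [1, 2] and [1, 1] correspond to phi = [2], [2, 3] and [2, 2].\n",
    "for drawn in ([1], [1, 2], [1, 1]):\n",
    "    print(drawn, [str(p) for p in draw_probabilities(IND_M, drawn)])"
   ]
  },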
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
